MEDB 5505, Module01

2025-01-23

Topics to be covered

  • What you will learn
    • History of R
    • Installing R
    • Objects in R
    • Anatomy of a small R program
    • Live demonstration
    • Good programming practices
    • Your programming assignment

Special note

  • This slide show was created using R.

History of R

This portion of the talk is in the file history-of-r.pptx.

Break #1

  • What you have learned
    • History of R
  • What’s coming next
    • Installing R

Installing R (https://cran.r-project.org/)

Screenshot of webpage for installation of R

Installing RStudio (https://rstudio.com/)

Screenshot of main page for RStudio

Installing RStudio (https://rstudio.com/)

Screenshot of products of RStudio

Installing R and RStudio

  • R is required
  • RStudio is strongly recommended
  • Do not delay in getting this software installed
  • Find me if you have ANY problems

“A place for everything, everything in its place”

  • data, for raw/intermediate data files
  • doc, for documentation
  • images, for graphs/illustrations
  • results, for program output
  • src, for program code
  • Other folders as needed

Break #2

  • What you have learned
    • Installing R
  • What’s coming next
    • Objects in R

Introduction

This is a very brief introduction to the basic objects in R.

R.version.string
[1] "R version 4.4.1 (2024-06-14 ucrt)"
Sys.Date()
[1] "2025-01-24"

Assignment and naming conventions

  • Use <- and -> to assign objects to a name
    • Avoid using = for an assignment
  • Rules for names
    • Combination of letters and numbers
    • No spaces
    • No symbols other than underscore (_) and dot (.)
    • Cannot start with a number or an underscore
      • a1 is okay, but 1a is not
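
You can check these rules with base R's make.names(), which converts any string into a syntactically valid name. The helper is_valid_name() below is a hypothetical name chosen for this sketch, not a built-in function. It assumes that a name is valid when make.names() leaves it unchanged.

```r
# Both arrows assign a value; the arrow points toward the name
a1 <- 5
10 -> b1

# Hypothetical helper (not part of base R): a name is valid
# if make.names() does not need to change it
is_valid_name <- function(name) {
  make.names(name) == name
}

is_valid_name("a1")       # TRUE
is_valid_name("1a")       # FALSE: starts with a number
is_valid_name("my name")  # FALSE: contains a space
make.names("1a")          # returns "X1a": R prepends an X to repair it
```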

Recommendations for names, 1

  • Avoid generic names (x, y or v1, v2, v3)
  • Don’t run two words together (writersexchange)
  • Use short words separated by underscores (writers_exchange)
    • All lower case
    • No abbreviations
    • Avoid names identical with common functions
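
To avoid reusing the name of a common function, you can ask R whether a name is already taken. Base R's exists() returns TRUE if the name is found on the search path, which includes base R and any packages you have loaded. The name writers_exchange is simply the example from this slide.

```r
# exists() looks for a name in the current session and loaded packages
exists("mean")              # TRUE: mean() is a base R function
exists("sqrt")              # TRUE: so is sqrt()
exists("writers_exchange")  # FALSE, unless you have already created it
```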

Recommendations for names, 2

  • Common alternatives to the underscore separator
    • Short words separated by dots (writers.exchange)
    • Start each word with a capital letter (WritersExchange)
    • Use dash instead of underscore for file names, chunk names (writers-exchange)
  • What should YOU use?
    • Now: Anything is fine, just be consistent
    • Later: See if there is an official or unofficial company standard

Functions

sqrt(3)
[1] 1.732051
sqrt(1:5)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
barplot(height=5:1, names.arg=1:5)

Nested functions and pipes

x <- 0.9
y <- asin(sqrt(x))
y
[1] 1.249046
x |>
  sqrt() |>
  asin() -> y
y
[1] 1.249046
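
A third option stores each intermediate result under its own name, which can make a long chain easier to debug. The name x_square_root is illustrative. Note that the native pipe |> was added in R 4.1.0, so the last check needs R 4.1.0 or later.

```r
x <- 0.9

# Option 3: one step per line, with a descriptive name for each result
x_square_root <- sqrt(x)
y <- asin(x_square_root)
y

# All three approaches give the same answer
stopifnot(all.equal(y, asin(sqrt(x))))
stopifnot(all.equal(y, x |> sqrt() |> asin()))
```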

Named arguments in functions

qnorm(p=0.99, mean=100, sd=15)
[1] 134.8952
qnorm(0.99, 100, 15)
[1] 134.8952
qnorm(0.99)
[1] 2.326348
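
The unnamed calls work because R matches arguments by position. Base R's args() shows the order that qnorm() expects. Named arguments can appear in any order, but unnamed arguments must follow that order exactly.

```r
# Show the arguments of qnorm() and their default values
args(qnorm)

# Named arguments can appear in any order
qnorm(sd=15, mean=100, p=0.99)

# Unnamed arguments are matched by position: here 15 becomes
# the mean and 100 the standard deviation, which is probably not intended
qnorm(0.99, 15, 100)
```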

Scalars

scalar_example_1 <- 3
scalar_example_1
[1] 3
scalar_example_2 <- "R"
scalar_example_2
[1] "R"
scalar_example_3 <- "3"
scalar_example_3
[1] "3"
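
R has no separate scalar type: each of these objects is a vector of length one. The quotation marks matter, because scalar_example_3 stores the character string "3" rather than the number 3, as class() reveals.

```r
scalar_example_1 <- 3
scalar_example_3 <- "3"

length(scalar_example_1)  # 1: a "scalar" is a vector of length one
class(scalar_example_1)   # "numeric"
class(scalar_example_3)   # "character"

scalar_example_1 + 1              # 4
as.numeric(scalar_example_3) + 1  # 4, after converting the text to a number
# scalar_example_3 + 1 would fail: R cannot do arithmetic on text
```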

Vectors

vector_example_1 <- c(1, 2, 3)
vector_example_1
[1] 1 2 3
vector_example_2 <- c("a", "b", "c")
vector_example_2
[1] "a" "b" "c"
vector_example_3 <- c("a", 2)
vector_example_3
[1] "a" "2"
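
The last example shows coercion: a vector holds only one type, so mixing a number with text converts everything to character. Square brackets select elements by position.

```r
vector_example_1 <- c(1, 2, 3)
vector_example_3 <- c("a", 2)

class(vector_example_1)    # "numeric"
class(vector_example_3)    # "character": the 2 was converted to "2"

# Square brackets pick out elements by position
vector_example_1[2]        # 2
vector_example_1[c(1, 3)]  # 1 3
```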

Naming vectors

my_degrees <- c(
  BA=1977, 
  MS=1978, 
  PhD=1982)
my_degrees
  BA   MS  PhD 
1977 1978 1982 
my_name <- c(
  first_name="Stephen", 
  middle_initial="D", 
  last_name="Simon")
my_name
    first_name middle_initial      last_name 
     "Stephen"            "D"        "Simon" 

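Names let you pull out elements by label instead of by position. Single brackets keep the name; double brackets return the bare value.

```r
my_degrees <- c(BA=1977, MS=1978, PhD=1982)

names(my_degrees)           # "BA" "MS" "PhD"
my_degrees["MS"]            # named value: MS 1978
my_degrees[["MS"]]          # bare value: 1978
my_degrees[c("BA", "PhD")]  # several elements at once
```
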
Matrices using cbind and rbind functions

matrix_example_1 <- 
  cbind(
    c(1, 2, 3), 
    c(4, 5, 6))
matrix_example_1
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
matrix_example_2 <- 
  rbind(
    c(1, 2, 3), 
    c(4, 5, 6))
matrix_example_2
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
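
dim() reports the number of rows and columns, and a single element is selected with [row, column]. Leaving one index blank selects an entire row or column, and t() transposes a matrix.

```r
matrix_example_1 <- cbind(c(1, 2, 3), c(4, 5, 6))

dim(matrix_example_1)   # 3 2: three rows, two columns
matrix_example_1[2, 2]  # 5: row 2, column 2
matrix_example_1[, 1]   # 1 2 3: all of column 1
matrix_example_1[3, ]   # 3 6: all of row 3
t(matrix_example_1)     # transpose: same layout as the rbind() example
```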

Matrices using the matrix function

matrix_example_3 <- 
  matrix(
    c(1, 2, 3, 4, 5, 6), 
    nrow=2, 
    ncol=3, 
    byrow=TRUE)
matrix_example_3
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
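
By default, matrix() fills values column by column (byrow=FALSE), which reproduces the cbind() layout; byrow=TRUE fills row by row, matching rbind(). A quick check of both:

```r
values <- c(1, 2, 3, 4, 5, 6)

# Column-by-column fill (the default) matches cbind()
by_column <- matrix(values, nrow=3, ncol=2)
all.equal(by_column, cbind(c(1, 2, 3), c(4, 5, 6)))  # TRUE

# Row-by-row fill matches rbind()
by_row <- matrix(values, nrow=2, ncol=3, byrow=TRUE)
all.equal(by_row, rbind(c(1, 2, 3), c(4, 5, 6)))     # TRUE
```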

Lists

list_example_1 <- 
  list(
    scalar_example_1, 
    vector_example_2, 
    matrix_example_3)
list_example_1
[[1]]
[1] 3

[[2]]
[1] "a" "b" "c"

[[3]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
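
Unlike a vector, a list can hold objects of different types and sizes. Double brackets extract an element itself, while single brackets return a smaller list.

```r
list_example_1 <- list(
  3,
  c("a", "b", "c"),
  matrix(c(1, 2, 3, 4, 5, 6), nrow=2, byrow=TRUE))

length(list_example_1)      # 3 elements
list_example_1[[2]]         # the character vector "a" "b" "c"
class(list_example_1[[2]])  # "character"
class(list_example_1[2])    # "list": single brackets keep the list wrapper
list_example_1[[3]][2, 3]   # 6: row 2, column 3 of the matrix
```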

Lists using names

list_example_2 <- 
  list(
    name=my_name, 
    degrees=my_degrees, 
    age=64)
list_example_2
$name
    first_name middle_initial      last_name 
     "Stephen"            "D"        "Simon" 

$degrees
  BA   MS  PhD 
1977 1978 1982 

$age
[1] 64
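
With a named list, the $ operator extracts elements by name, and str() gives a compact summary of the whole structure.

```r
list_example_2 <- list(
  name=c(first_name="Stephen", middle_initial="D", last_name="Simon"),
  degrees=c(BA=1977, MS=1978, PhD=1982),
  age=64)

names(list_example_2)            # "name" "degrees" "age"
list_example_2$age               # 64
list_example_2$degrees[["PhD"]]  # 1982
str(list_example_2)              # compact overview of each element
```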

Data frames

data_frame_example_1 <- 
  data.frame(
    vector_example_1, 
    vector_example_2)
data_frame_example_1
  vector_example_1 vector_example_2
1                1                a
2                2                b
3                3                c
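
A data frame is a list of columns that all have the same length. str(), nrow(), and ncol() describe it, and $ extracts a single column. Since R 4.0.0, data.frame() keeps text as character by default instead of converting it to a factor.

```r
vector_example_1 <- c(1, 2, 3)
vector_example_2 <- c("a", "b", "c")
data_frame_example_1 <- data.frame(vector_example_1, vector_example_2)

str(data_frame_example_1)
nrow(data_frame_example_1)             # 3 rows
ncol(data_frame_example_1)             # 2 columns
data_frame_example_1$vector_example_2  # "a" "b" "c"
```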

Naming data frame columns

data_frame_example_2 <- 
  data.frame(
    c(1, 2, 3), 
    c("a", "b", "c"))
data_frame_example_2
  c.1..2..3. c..a....b....c..
1          1                a
2          2                b
3          3                c
data_frame_example_3 <- 
  data.frame(
    small_numbers=c(1, 2, 3), 
    early_letters=c("a", "b", "c"))
data_frame_example_3
  small_numbers early_letters
1             1             a
2             2             b
3             3             c
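
If a data frame already has awkward column names, like data_frame_example_2, you can replace them afterwards with names().

```r
data_frame_example_2 <- data.frame(c(1, 2, 3), c("a", "b", "c"))

# Replace the automatically generated column names
names(data_frame_example_2) <- c("small_numbers", "early_letters")
data_frame_example_2
```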

Tibbles

library(tidyverse)

tibble_example_1 <- 
  tibble(
    x=c(1, 2, 3),
    y=c("a", "b", "c"))
tibble_example_1
# A tibble: 3 × 2
      x y    
  <dbl> <chr>
1     1 a    
2     2 b    
3     3 c    

Vector or tibble?

sample_vector <- 1:5
sample_vector
[1] 1 2 3 4 5
sample_tibble <- tibble(sample_vector)
sample_tibble
# A tibble: 5 × 1
  sample_vector
          <int>
1             1
2             2
3             3
4             4
5             5

Break #3

  • What you have learned
    • Objects in R
  • What’s coming next
    • Anatomy of a small R program

Anatomy of a small R program, overview

YAML header

---
title: "Illustrating the structure of an R program"
editor: source
format: 
  html:
    embed-resources: true
execute: 
  error: true
---

First comment


This program was written by Steve Simon and created on 2019-01-28 with a major
revision on 2024-12-27. It is used to illustrate the structure of an R program. 
This program is in the public domain. You can use it any way that you please.

First code chunk

```{r}
#| label: setup
#| message: false
#| warning: false

R.version.string
Sys.Date()
library(tidyverse)
```

Second comment


Read data from the aids-cases text file. This file is described at

https://github.com/pmean/data/blob/main/files/aids-cases.yaml

Second code chunk

```{r}
#| label: read-text-file

aids_cases <- read_csv(
  file="../data/aids-cases.csv",
  col_types="nnn")
glimpse(aids_cases)
```

Third comment


This is a small dataset with only three variables. Now let's draw a line graph.

Third code chunk

```{r}
#| label: line-graph

aids_cases |>
  ggplot() +
    aes(yr, nsw) +
    geom_line()
```

Fourth comment


There is an increasing trend in AIDS cases in New South Wales over time.

Anatomy of a small program, review

Output, overview

Output, part 1

Output, part 2

Output, part 3

Suggestions for nice looking comments, 1

  • Quarto (and Rmarkdown) use tagged text files
    • Based on Markdown
    • Easy to remember
    • Easy to read in its raw form
    • Can be edited with any program that edits text files

Suggestions for nice looking comments, 2

  • Interface with Pandoc to convert to (and from)
    • Microsoft Word, PowerPoint
    • HTML files
    • PDF files

Suggestions for nice looking comments, 3

  • Start a line with ## for headings
  • Start lines with -, +, or * for bulleted lists
    • Indent for sub bullets
  • Surround text with ** for bold
  • Surround text with $ for Greek letters (μ) and math symbols (√2)
  • Use [] for hyperlinks

Many more in the Quarto guide

An example of raw Markdown codes

## Suggestions for nice looking comments

-   Start a line with ## for headings
-   Start lines with -, +, or * for bulleted lists
    -   Indent for sub bullets
-   Surround text with ** for **bold**
-   Surround text with $ for Greek letters ($\mu$) and math symbols ($\sqrt{2}$) 
-   Use [] for hyperlinks

Many more in the [Quarto guide][ref43]

[ref43]: https://quarto.org/docs/authoring/markdown-basics.html

Break #4

  • What you have learned
    • Anatomy of a small R program
  • What’s coming next
    • Live demonstration

Live demonstration of running R

In this segment, you will see a live demonstration of running the program simon-5505-01-demo.qmd.

Break #5

  • What you have learned
    • Live demonstration
  • What’s coming next
    • Good programming practices

General requirements for any program

There are standards in six areas:

  • Documentation
  • Graphs
  • Tables
  • Readability
  • Interpretation
  • Conciseness

There may be times when one or two of these standards do not apply. Which standards apply and which don’t should be obvious from the nature of the programming assignment. Ask if you are unsure what is required.

Documentation is required!

Documentation should include

  • the name of the author (you!),
  • the creation date,
  • the purpose of your program, and
  • any restrictions on use (your choice).
    • Public domain (no restrictions)
    • Specific restrictions on how others can use your program

Graphs cannot rely on default choices, 1

Always modify your graphs. Do not settle for the default options.

  • Include your name and date in the title of any graph
    • “Steve Simon produced this graph on 2023-09-19.”
  • Avoid the display of unnecessary decimal places on the axes
  • Use comma separators for large numbers
  • Replace category codes with descriptive labels

Graphs cannot rely on default choices, 2

  • Replace short variable names with longer descriptors
    • Include units of measurement, if needed
  • Avoid the gratuitous use of color
    • Unless needed to distinguish between groups
    • Fill boxes and points with white/transparent colors
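
The course uses ggplot2, but the same principles can be sketched in base R with the built-in Nile data set (annual flow of the Nile at Aswan, 1871 to 1970, in units of 10^8 cubic meters). "Your Name" in the title is a placeholder for your own name. The sketch adds a descriptive title, axis labels with units, and comma separators on the vertical axis.

```r
nile_years <- as.numeric(time(Nile))
nile_flow <- as.numeric(Nile)

# Tick marks for the vertical axis, with comma separators for values >= 1,000
y_ticks <- pretty(nile_flow)
y_labels <- format(y_ticks, big.mark=",", trim=TRUE)

plot(
  nile_years,
  nile_flow,
  type="l",
  axes=FALSE,
  main=paste0("Your Name produced this graph on ", Sys.Date(), "."),
  xlab="Year",
  ylab="Annual flow (100 million cubic meters)")
axis(1)
axis(2, at=y_ticks, labels=y_labels)
box()
```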

Tables also need modification

  • Round to two or three significant figures
  • Use comma separators if numbers are >= 1,000
  • Avoid scientific notation (e.g., 1.23E-04)
  • Avoid displaying p-values as zero (e.g., p=0.000)
    • Change to p<0.001
  • Suppress the printing of unneeded tables
    • Sometimes difficult
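
One way to follow the p-value advice is a small helper that prints three decimal places and switches to p<0.001 for very small values. format_p_value() is a hypothetical name for this sketch; base R's format.pval() offers a similar built-in option through its eps argument.

```r
# Hypothetical helper: show p-values to three decimals,
# but never as p=0.000
format_p_value <- function(p) {
  ifelse(
    p < 0.001,
    "p<0.001",
    paste0("p=", formatC(p, format="f", digits=3)))
}

format_p_value(c(0.0000123, 0.0432, 0.5))
# returns "p<0.001" "p=0.043" "p=0.500"
```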

Sometimes default tables/graphs are acceptable

  • Early assignments may ask for defaults
  • Always round and specify units in your interpretations

Your code must be easy to read

  • Make liberal use of
    • blank lines
    • line breaks
    • indenting
    • vertical lists
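
Spacing and line breaks do not change what R computes, only how easily a person can follow the code. The two data frames below (with illustrative names) are identical; only the second uses the line breaks, indenting, and vertical layout recommended here.

```r
# Hard to read: everything crammed onto one line
df_crammed<-data.frame(small_numbers=c(1,2,3),early_letters=c("a","b","c"))

# Easier to read: one argument per line, indented, with spaces after commas
df_readable <-
  data.frame(
    small_numbers=c(1, 2, 3),
    early_letters=c("a", "b", "c"))

# Layout changes readability, not results
identical(df_crammed, df_readable)  # TRUE
```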

Always include an interpretation

  • Use simple evaluative words
    • Young/Elderly
    • Less than half/more than half
    • Almost all/almost none
    • Substantial improvement/roughly comparable
  • Depends on context
    • No penalty for subjective judgments

Conciseness

  • Do not include analyses that were not asked for
  • Avoid displaying excessively large tables
    • This may be difficult for SAS and SPSS

Data dictionary

If you use a data set that you found on your own rather than one that your instructors provided, you must include a data dictionary. The elements of a data dictionary should include:

  • Source
  • Description
  • Copyright
  • Size
  • Variables

Data dictionary: source

  • Where did you find the data?
    • Website link
    • Formal reference (if available)

Include a complete URL, unless your data is behind a paywall. If your data is associated with a peer-reviewed publication, provide a formal reference to that publication.

Data dictionary: Description

Provide a few sentences explaining the context of your data. Explain how the data was collected and what it is being used for.

Data dictionary: Copyright

  • Open source license
    • Use the data with just a few restrictions

Data dictionary: Size

  • Number of rows (excluding a header row)
  • Number of columns
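
In R, nrow() and ncol() give both size numbers directly; the header row holds the column names and is not counted as a row. The sketch uses the built-in airquality data set as a stand-in for your own data.

```r
data_to_describe <- airquality  # stand-in for your own data set

nrow(data_to_describe)  # 153 rows
ncol(data_to_describe)  # 6 columns
dim(data_to_describe)   # both at once: 153 6
```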

Data dictionary: Variables

  • Name
  • Label
  • Units of measure

Data dictionary: Variable scale

  • Scale
    • Nominal
    • Ordinal
    • Interval
    • Ratio

Data dictionary: Variable range

  • Range
    • Non-negative (>= 0)
    • Positive (> 0)
    • Upper bound, if any

Data dictionary: Variable type

  • Type
    • Integer
    • Float
    • Character
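
R can draft the type and range entries for you. class() maps roughly onto the types above ("integer" for Integer, "numeric" for Float, "character" for Character), and range() gives the observed minimum and maximum. The scale, and whether a variable must be positive or has an upper bound, still depend on what the variable measures. Again, airquality stands in for your own data.

```r
data_to_describe <- airquality  # stand-in for your own data set

# Storage type of each variable
sapply(data_to_describe, class)

# Observed minimum (row 1) and maximum (row 2) of each variable,
# ignoring missing values
sapply(data_to_describe, range, na.rm=TRUE)
```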

File details

This file was written by Steve Simon on 2024-12-26. It is in the public domain and you can use it any way you please.

The format function

library(glue)
for (i in seq(2, 10, by=2)) {
  x <- factorial(i)
  y <- format(x, big.mark=",")
  print(glue("{i}! = {y}"))
}
2! = 2
4! = 24
6! = 720
8! = 40,320
10! = 3,628,800

The round function for large numbers

for (i in seq(2, 10, by=2)) {
  x <- factorial(i)
  y <- round(x, digits=-2)
  print(glue("{i}! is approximately {y}"))
}
2! is approximately 0
4! is approximately 0
6! is approximately 700
8! is approximately 40300
10! is approximately 3628800

The signif function for large numbers

for (i in seq(2, 10, by=2)) {
  x <- factorial(i)
  y <- signif(x, digits=2)
  print(glue("{i}! is approximately {y}"))
}
2! is approximately 2
4! is approximately 24
6! is approximately 720
8! is approximately 40000
10! is approximately 3600000

The round function for small numbers

for (i in seq(2, 10, by=2)) {
  x <- 0.5^i
  y <- round(x, digits=2)
  print(glue("0.5^{i} is approximately {y}"))
}
0.5^2 is approximately 0.25
0.5^4 is approximately 0.06
0.5^6 is approximately 0.02
0.5^8 is approximately 0
0.5^10 is approximately 0

The signif function for small numbers

for (i in seq(2, 10, by=2)) {
  x <- 0.5^i
  y <- signif(x, digits=2)
  print(glue("0.5^{i} is approximately {y}"))
}
0.5^2 is approximately 0.25
0.5^4 is approximately 0.062
0.5^6 is approximately 0.016
0.5^8 is approximately 0.0039
0.5^10 is approximately 0.00098
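
These functions can be combined: signif() picks the precision and format() adds the comma separators. Setting scientific=FALSE stops R from switching to notation like 1e-04, which the table guidelines ask you to avoid.

```r
# Two significant figures, then commas for readability
format(signif(factorial(10), 2), big.mark=",", scientific=FALSE)
# returns "3,600,000"

# Two significant figures for a small number, without scientific notation
format(signif(0.5^10, 2), scientific=FALSE)
# returns "0.00098"
```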

Break #6

  • What you have learned
    • Good programming practices
  • What’s coming next
    • Your programming assignment

Program

  • Download the demo program from module01
    • Store it in your src folder
  • Modify the file name
    • Use your last name instead of “simon”
    • Change “demo” to “aids-cases”
  • Modify the documentation headers
    • Add your name
    • Optional: change the copyright statement

Data

Question 1

Calculate the minimum and maximum number of AIDS cases in Victoria from 1982 to 1987. The default format for this table is acceptable. Provide a brief interpretation.

Question 2

Graph the trend in AIDS cases in Victoria from 1982 to 1987. Use a nice format and provide a brief interpretation.

Important reminder

Keep your output brief. After you have gotten your program to work, remove any code that does not address the questions above. Also, please remove any “Comments on the code” sections. But do provide interpretations when asked.

Grading rubric

You will be evaluated using the general grading rubric for programming assignments.

Your submission

  • Save the output in HTML format
  • Convert it to PDF format
  • Make sure that the PDF file includes
    • Your last name
    • The number of this course
    • The number of this module
  • Upload the file

If it doesn’t work

Please review the suggestions if you encounter an error page.

File details

This programming assignment was written by Steve Simon on 2024-12-18 and is placed in the public domain.

Summary

  • What you have learned
    • History of R
    • Installing R
    • Objects in R
    • Anatomy of a small R program
    • Live demonstration
    • Good programming practices
    • Your programming assignment